quadratic activation function
Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with a quadratic activation function in the overparametrized regime, where the layer width m is larger than the input dimension d. We consider a teacher-student scenario in which the teacher has the same structure as the student but a hidden layer of smaller width m* <= m. We describe how the empirical loss landscape is affected by the number n of data samples and the width m* of the teacher network. In particular, we determine how the probability that there are no spurious minima of the empirical loss depends on n, d, and m*, thereby establishing conditions under which the neural network can in principle recover the teacher. We also show that under the same conditions gradient descent dynamics on the empirical loss converges and leads to small generalization error, i.e., it enables recovery in practice. Finally, we characterize the convergence rate of gradient descent in time in the limit of a large number of samples. These results are confirmed by numerical experiments.
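The setup above lends itself to a short numerical illustration. The sketch below is an assumed, minimal version of the teacher-student experiment (not the authors' exact code or hyperparameters): a quadratic-activation teacher of width m*, an overparametrized student of width m > d, n Gaussian samples, and full-batch gradient descent on the empirical squared loss.

```python
# Minimal sketch, assuming the teacher-student setup described in the abstract:
# f_W(x) = sum_i (w_i^T x)^2 = x^T W^T W x, with an overparametrized student
# (m > d) trained by full-batch gradient descent on the empirical squared loss.
# All hyperparameters below are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d, m_star, m, n = 8, 2, 16, 400        # input dim, teacher width, student width, samples
lr, steps = 2e-3, 10000

W_teacher = rng.standard_normal((m_star, d))    # teacher weights, width m* <= m
W = rng.standard_normal((m, d)) / np.sqrt(m)    # student initialization

X = rng.standard_normal((n, d))                 # Gaussian inputs
y = np.sum((X @ W_teacher.T) ** 2, axis=1)      # teacher labels

for _ in range(steps):
    pred = np.sum((X @ W.T) ** 2, axis=1)       # student output x^T W^T W x
    resid = pred - y
    # gradient of (1/2n) sum_k resid_k^2 w.r.t. W is (2/n) * W * sum_k resid_k x_k x_k^T
    grad = (2.0 / n) * (W @ ((X.T * resid) @ X))
    W -= lr * grad

train_loss = 0.5 * np.mean((np.sum((X @ W.T) ** 2, axis=1) - y) ** 2)
X_test = rng.standard_normal((5 * n, d))        # fresh samples to estimate generalization error
gen_err = 0.5 * np.mean((np.sum((X_test @ W.T) ** 2, axis=1)
                         - np.sum((X_test @ W_teacher.T) ** 2, axis=1)) ** 2)
print(f"train loss {train_loss:.3e}, generalization error {gen_err:.3e}")
```

Varying n, d, and m* in such a sketch is a quick way to probe when recovery (small generalization error) actually happens.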
Review for NeurIPS paper: Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
Reviews for this paper are mixed; in particular, some reviewers were concerned about missing proofs. On the other hand, the paper studies an important problem and carries out a nice analysis that integrates numerical experiments, heuristic derivations, and rigorous proofs in a meaningful way, and the reader learns a lot about such models (quadratic two-layer networks with a sparse teacher). It is thus necessary that the authors put substantial effort into writing the missing proofs thoroughly, because it will not be possible to review those proofs again (and of course all the other changes proposed in the rebuttal should be implemented). Overall, for a paper that contains true statements, conjectures, and heuristics, it is very important to make the "truth status" of each statement explicit, and "true statements" should have a proof.
Review for NeurIPS paper: Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
For random initialization, I also believe the analysis still needs substantial work. The upper bound on E(A(t)) clearly depends on the condition number of A(0), rather than simply on whether A(0) is full-rank or rank-deficient. Moreover, rather than focusing only on the full-rank case, the authors could treat the problem uniformly and continuously; for example, the Marchenko-Pastur law from random matrix theory may provide an asymptotic analysis of random initialization, since the limiting distribution of the eigenvalues is known. A non-asymptotic version may also exist, but it would require additional perturbation bounds. Finally, due to my research background, I had overlooked the development of shallow neural networks with random Gaussian input; I apologize for that and raise my score.
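The Marchenko-Pastur suggestion can be checked numerically. Assuming the common initialization A(0) = W(0)^T W(0) with i.i.d. Gaussian entries of variance 1/m (an assumption; the review does not specify the scaling), the eigenvalues of A(0) concentrate on the Marchenko-Pastur support with ratio d/m, which pins down the condition number that such a bound would depend on:

```python
# Hedged numerical check of the reviewer's point: for W(0) in R^{m x d} with
# i.i.d. N(0, 1/m) entries (assumed scaling), the spectrum of A(0) = W(0)^T W(0)
# follows the Marchenko-Pastur law with ratio lambda = d/m, so the condition
# number of A(0) concentrates around ((1 + sqrt(lambda)) / (1 - sqrt(lambda)))^2.
import numpy as np

rng = np.random.default_rng(1)
d, m, trials = 50, 200, 20
lam = d / m
edge_lo, edge_hi = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2

conds = []
for _ in range(trials):
    W0 = rng.standard_normal((m, d)) / np.sqrt(m)
    eigs = np.linalg.eigvalsh(W0.T @ W0)     # spectrum of A(0), ascending
    conds.append(eigs[-1] / eigs[0])         # condition number of A(0)

print(f"MP support      : [{edge_lo:.3f}, {edge_hi:.3f}]")
print(f"predicted kappa : {edge_hi / edge_lo:.2f}")
print(f"empirical kappa : {np.mean(conds):.2f} +/- {np.std(conds):.2f}")
```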
Nonconvex sparse regularization for deep neural networks and its optimality
Recent theoretical studies have proved that deep neural network (DNN) estimators obtained by minimizing the empirical risk under a certain sparsity constraint can attain optimal convergence rates for regression and classification problems. However, the sparsity constraint requires knowledge of certain properties of the true model, which are not available in practice. Moreover, computation is difficult due to the discrete nature of the sparsity constraint. In this paper, we propose a novel penalized estimation method for sparse DNNs that resolves the aforementioned problems of the sparsity constraint. We establish an oracle inequality for the excess risk of the proposed sparse-penalized DNN estimator and derive convergence rates for several learning tasks. In particular, we prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems. For computation, we develop an efficient gradient-based optimization algorithm that guarantees the monotonic reduction of the objective function.
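As a rough sketch of the kind of penalty involved (the paper's exact penalty and algorithm are not reproduced here), a clipped-L1 penalty is a standard nonconvex sparsity surrogate that behaves like L1 near zero and stops penalizing large weights; the names and hyperparameters below are illustrative.

```python
# Illustrative nonconvex sparsity penalty (clipped L1), used here only as a
# stand-in for the sparse penalty discussed above: lam * min(|w|, tau) penalizes
# small weights like L1 but does not keep shrinking large weights.
import numpy as np

def clipped_l1(w, lam=1e-2, tau=0.1):
    """Penalty value: lam * sum_j min(|w_j|, tau)."""
    return lam * np.minimum(np.abs(w), tau).sum()

def clipped_l1_grad(w, lam=1e-2, tau=0.1):
    """(Sub)gradient: lam * sign(w_j) where |w_j| <= tau, and 0 where |w_j| > tau."""
    return lam * np.sign(w) * (np.abs(w) <= tau)

# Inside a training loop one would add the penalty to the empirical risk, e.g.
#   loss = empirical_risk(W) + clipped_l1(W)
#   grad = risk_grad(W) + clipped_l1_grad(W)
#   W   -= step_size * grad
w = np.array([-0.5, -0.05, 0.0, 0.02, 0.8])
print(clipped_l1(w), clipped_l1_grad(w))
```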
Smooth function approximation by deep neural networks with general activation functions
There has been growing interest in the expressivity of deep neural networks, but most existing work on this topic focuses only on specific activation functions such as ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a quite general class of activation functions, which includes most commonly used activation functions. We derive the depth, width, and sparsity of a deep neural network required to approximate any Hölder smooth function up to a given approximation error for this large class of activation functions. Based on our approximation error analysis, we establish the minimax optimality of deep neural network estimators with general activation functions in both regression and classification problems.
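As a toy illustration of this approximation question (not the constructions or rates from the paper), the sketch below fits the Hölder smooth target f(x) = |x|^{3/2} on [-1, 1] with a small two-hidden-layer network using a non-ReLU activation (softplus); depth, width, and learning rate are arbitrary illustrative choices.

```python
# Hedged toy example: approximate a Hoelder smooth target with a small deep
# network using a general (non-ReLU) activation, here softplus.  Network size
# and training schedule are illustrative, not the paper's constructions.
import numpy as np

rng = np.random.default_rng(2)
softplus = lambda z: np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0)   # numerically stable softplus
dsoftplus = lambda z: 1.0 / (1.0 + np.exp(-z))                          # its derivative (sigmoid)

n, width, lr, steps = 256, 32, 1e-2, 20000
x = np.linspace(-1, 1, n).reshape(-1, 1)
y = np.abs(x) ** 1.5                                                    # Hoelder smooth target

# Two-hidden-layer network: input -> softplus -> softplus -> linear output
W1 = rng.standard_normal((1, width)); b1 = np.zeros(width)
W2 = rng.standard_normal((width, width)) / np.sqrt(width); b2 = np.zeros(width)
W3 = rng.standard_normal((width, 1)) / np.sqrt(width); b3 = np.zeros(1)

for _ in range(steps):
    z1 = x @ W1 + b1;  h1 = softplus(z1)
    z2 = h1 @ W2 + b2; h2 = softplus(z2)
    pred = h2 @ W3 + b3
    g = (pred - y) / n                       # gradient of 0.5 * mean squared error
    gW3 = h2.T @ g;                 gb3 = g.sum(0)
    g2  = (g @ W3.T) * dsoftplus(z2)
    gW2 = h1.T @ g2;                gb2 = g2.sum(0)
    g1  = (g2 @ W2.T) * dsoftplus(z1)
    gW1 = x.T @ g1;                 gb1 = g1.sum(0)
    for P, G in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2), (W3, gW3), (b3, gb3)):
        P -= lr * G                          # in-place gradient descent update

pred = softplus(softplus(x @ W1 + b1) @ W2 + b2) @ W3 + b3
print("sup-norm fit error:", float(np.max(np.abs(pred - y))))
```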